ABSTRACT

A new approach to estimating photometric redshifts - using Artificial Neural Networks 
(ANNs) - is investigated. Unlike the standard template-fitting photometric redshift 
technique, a large spectroscopically-identified training set is required but, where one 
is available, ANNs produce photometric redshift accuracies at least as good as and 
often better than the template-fitting method. The Bayesian priors on the underlying 
redshift distribution are automatically taken into account. Furthermore, inputs other 
than galaxy colours - such as morphology, angular size and surface brightness - may 
be easily incorporated, and their utility assessed. 

Different ANN architectures are tested on a semi-analytic model galaxy catalogue 
and the results are compared with the template-fitting method. Finally the method is 
tested on a sample of ~20000 galaxies from the Sloan Digital Sky Survey. The r.m.s.
redshift error in the range z ≲ 0.35 is σ_z ~ 0.021.

Key words: surveys - galaxies: distances and redshifts - methods: data analysis 



1 INTRODUCTION 



The basic photometric redshift technique is to use the 
colours of a galaxy in a selection of medium- or broad-band 
filters as a crude approximation of the galaxy's spectral en- 
ergy distribution or SED, in order to find its redshift and 
spectral type. The technique is very efficient compared with 
spectroscopic redshifts since the signal-to-noise in broad- 
band filters is much greater than the signal-to-noise in a 
dispersed spectrum and, furthermore, a whole field of galax- 
ies may be imaged at once while spectroscopy is limited to 
individual galaxies or those that can be positioned on slits or 
fibres. However photometric redshifts are only approximate 
at best and are sometimes subject to complete misidenti- 
fications. For many applications though, large sample sizes 
are more important than precise redshifts and photometric 
redshifts may be used to good effect. 

Photometric redshifts date back to Baum (1962; see 
also Hogg et al. 1998; Weymann et al. 1999). They have 
been used extensively in recent years on the ultra-deep 
and well-calibrated Hubble Deep Field observations (e.g. 
Gwyn & Hartwick 1996; Connolly, Szalay & Brunner 1998;
Fernandez-Soto, Lanzetta & Yahil 1999; Fontana et al. 2000;
Fernandez-Soto et al. 2001; Massarotti, Iovino & Buzzoni
2001a; Massarotti et al. 2001b). The most commonly used
approach is the template-fitting technique. This involves
compiling a library of template spectra - either theoretical
SEDs from population synthesis models (e.g. GISSEL -
Bruzual & Charlot 1993) or empirical SEDs (e.g. Coleman,
Wu & Weedman 1980). Then the expected flux through each
survey filter is calculated for each template SED on a grid 
of redshifts, with corrections for ISM, IGM and Galactic ex- 
tinction where necessary. A redshift and spectral type are 
estimated for each observed galaxy by minimizing χ² with
respect to redshift, z, and spectral type, SED, where

$$\chi^2(z,{\rm SED}) = \sum_i \left[ \frac{f_i - \alpha(z,{\rm SED})\, t_i(z,{\rm SED})}{\sigma_i} \right]^2 \qquad (1)$$

f_i is the observed flux in filter i, σ_i is the error in f_i,
t_i(z, SED) is the flux in filter i for the template SED at
redshift z and α(z, SED) (the scaling factor normalizing the
template to the observed flux) is determined by minimizing
equation 1 with respect to α, giving



$$\alpha(z,{\rm SED}) = \frac{\sum_i f_i\, t_i(z,{\rm SED})/\sigma_i^2}{\sum_i t_i(z,{\rm SED})^2/\sigma_i^2}. \qquad (2)$$



The template-fitting photometric redshift technique 
makes use of the available and reasonably detailed knowl- 
edge of galaxy SEDs and in principle it may be used reli- 
ably even for populations of galaxies for which there are few 
or no spectroscopically confirmed redshifts. However, crucial
to its success is the compilation of a library of accurate
and representative template SEDs (see e.g. Hogg et al. 1998; 
Firth 2002b). Empirical templates are typically derived from 
nearby bright galaxies, which may not be truly representa- 
tive of high redshift galaxies. Conversely, while theoretical 
SEDs can cover a large range of star formation histories, 
metallicities, dust extinction models etc., not all combina- 
tions of these parameters (at any particular redshift) are 
realistic, and the ad hoc inclusion of superfluous templates 
increases the potential for misidentifications when using ob- 
servations with noisy photometry. 
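
The template-fitting procedure of equations 1 and 2 can be illustrated with a minimal Python sketch. This is not the code used by any particular template-fitting package; the function names and the toy template grid are hypothetical, and real template fluxes would come from integrating redshifted SEDs through the survey filters.

```python
def best_scale(flux, sigma, template):
    """Equation 2: the scaling factor alpha that minimizes chi^2 for one template."""
    num = sum(f * t / s ** 2 for f, t, s in zip(flux, template, sigma))
    den = sum(t ** 2 / s ** 2 for t, s in zip(template, sigma))
    return num / den


def chi2(flux, sigma, template):
    """Equation 1, evaluated at the best-fitting alpha for this template."""
    alpha = best_scale(flux, sigma, template)
    return sum(((f - alpha * t) / s) ** 2
               for f, t, s in zip(flux, template, sigma))


def fit_redshift(flux, sigma, grid):
    """Return the (z, SED) key of the grid entry with the smallest chi^2.

    grid maps (redshift, sed_name) -> list of template fluxes, one per filter.
    """
    return min(grid, key=lambda key: chi2(flux, sigma, grid[key]))


# Toy grid (hypothetical numbers, three filters only).
toy_grid = {
    (0.1, "E"): [1.0, 2.0, 4.0],
    (0.5, "Sbc"): [1.0, 1.5, 2.0],
    (1.0, "Im"): [3.0, 2.0, 1.0],
}
observed_flux = [2.0, 3.0, 4.0]   # exactly twice the (0.5, "Sbc") template
observed_sigma = [0.1, 0.1, 0.1]
print(fit_redshift(observed_flux, observed_sigma, toy_grid))
```

Because α is solved analytically for each grid point, the search reduces to a scan over the (z, SED) grid, which is why superfluous templates directly increase the number of places where noisy photometry can find a spurious minimum.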

An alternative approach can be used when one has a 
sufficiently large (e.g. ~ 100-1000, depending on the redshift 
range) and representative subsample with spectroscopic red- 
shifts. Then one can fit a polynomial or other function map- 
ping the photometric data to the known redshifts and use 
this to estimate redshifts for the remainder of the sample 
with unknown redshifts (e.g. Connolly et al. 1995b; Brun- 
ner, Szalay & Connolly 2000; Sowards-Emmerd et al. 2000). 
With this approach, errors in the estimated redshifts may 
also be estimated analytically or via Monte Carlo simula- 
tions. 
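
As a rough illustration of this empirical approach, the sketch below fits a polynomial to a training set by least squares and then applies it to new objects. It is deliberately simplified to a single colour as input (the published applications fit functions of several magnitudes or colours), and the function names and toy numbers are assumptions made for the example.

```python
def polyfit(x, y, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum x^(i+j) and b[i] = sum y x^i.
    A = [[sum(xi ** (i + j) for xi in x) for j in range(n)] for i in range(n)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Toy training set: colours with known (spectroscopic) redshifts.
train_colour = [0.0, 0.5, 1.0, 1.5, 2.0]
train_z = [0.1 + 0.5 * c + 0.2 * c * c for c in train_colour]
coeffs = polyfit(train_colour, train_z, degree=2)
print(polyval(coeffs, 1.2))   # estimated redshift for a new object
```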

An extension of the latter approach is to use Artifi- 
cial Neural Networks (ANNs hereafter). ANNs have been 
used before in astronomy for, amongst other things, galaxy 
morphological classification (e.g. Storrie-Lombardi et al. 
1992; Nairn et al. 1995; Lahav et al. 1996), morphological 
star/galaxy separation (e.g. Bertin & Arnouts 1996; An- 
dreon et al. 2000) and stellar spectral classification (e.g. 
Bailer-Jones, Irwin & von Hippel 1998; Allende Prieto et
al. 2000; Weaver 2000). Essentially an ANN takes a set of 
inputs (e.g. logarithms of fluxes - i.e. magnitudes - in differ- 
ent filters) for each object, applies some non-linear function, 
and outputs a value (e.g. the estimated redshift). The ANN
is first trained - i.e. the coefficients (weights) of the function 
are optimized - by using a training set where the desired out- 
put is known. The ANN may then be used on any number 
of other objects with similar inputs (i.e. magnitudes in the 
same filter set) but unknown outputs (i.e. redshifts). 

As well as using all of the information contained in the 
magnitudes and colours, provided the training set is a repre- 
sentative subsample of the data, the ANN will also take into 
account the Bayesian priors on the galaxy redshift distribu- 
tion (cf. Benitez 1998; Teplitz et al. 2001). While choosing a 
template library that is both sufficient and non-superfluous 
is a source of concern for the template-fitting method, ANNs
automatically fit the true range of galaxy SEDs. Another potential
advantage of ANNs relative to the template-fitting
method is that the weights applied to each filter may be
more optimal than simple χ²-weighting. In addition one can
also feed in other observational input such as image size or 
surface brightness, morphology and concentration parame- 
ters where such data are available. It is interesting then to 
see how the two methods compare. 

This paper explores the use of ANNs as a potential tool 
for photometric redshift determination. The layout of this 
paper is as follows. In §2 the ANNs are described and in §3 
a semi-analytic model (used to provide a simulated galaxy 
catalogue) is introduced. In §4 the ANN parameters (ar- 
chitecture and training set size) are investigated using the 
simulated galaxy catalogue and in §5 the performance of
ANNs is compared with the performance of the traditional
template-fitting method. §6 looks at the effect of photometric
noise and in §7 ANNs are investigated as a method for
also determining spectral type from redshifted data. In §8,
ANNs are tested on Sloan Digital Sky Survey observational
data. The science prospects are briefly discussed in §9.

Figure 1. A schematic diagram of an ANN with input nodes
taking, for example, magnitudes m_i = −2.5 log10 f_i in various
filters, a single hidden layer, and a single output node giving,
for example, redshift z. The architecture is n:p:1 in the notation
used in this paper. Each connecting line carries a weight w_j. The
bias node allows for an additive constant in the network function
defined at each node. More complex nets can have additional
hidden layers.



2 ARTIFICIAL NEURAL NETWORKS 

An ANN comprises a set of input nodes, one or more out- 
put nodes, and one or more hidden layers each containing a 
number of nodes (Fig. 1; see e.g. Bishop 1995, and references 
therein, for background). A particular network architecture 
may be denoted by N_in:N_1:N_2:...:N_out where N_in is the number
of input nodes, N_1 is the number of nodes in the first
hidden layer, and so on. For example 9:6:1 takes 9 inputs, 
has 6 nodes in a single hidden layer and gives a single out- 
put. The nodes are connected and each connection carries 
a weight which together comprise the vector of coefficients 
w which are to be optimized. Unless otherwise stated, here 
every node is assumed to be connected to every node in the 
previous layer and to every node in the next layer only, but 
it is certainly possible to have more or less interconnected 
nets. The input parameters for each object are represented 
by the vector x (e.g. the magnitudes in a set of filters). 
Given a training set of inputs x_k and desired outputs z_k
(e.g. the redshift), the ANN is optimized by minimizing the
cost function

$$E = \sum_k \left[ z_k - F(\mathbf{w}, \mathbf{x}_k) \right]^2. \qquad (3)$$

The function F(w, x_k) is given by the network. A function
g_p is defined at each node p, taking as its argument

$$u_p = \sum_j w_j x_j \qquad (4)$$

where the sum is over the input nodes to p. These functions
are typically taken (in analogy to biological neurons) to be
sigmoid functions such as g_p(u_p) = 1/[1 + exp(−u_p)] (used
here). An extra input node - the bias node - is automatically
included to allow for additive constants in these functions.
The combination of these functions over all the network
nodes makes up the function F. A programme kindly
provided by B. D. Ripley was used to train the networks.
The programme takes as its input a network architecture, a
training set and a random seed to initiate the weight vector†,
and uses an iterative quasi-Newton method (see e.g.
Bishop 1995) to minimize the cost function. To ensure that
the weights are regularized (i.e. that they do not become too
large), an extra quadratic cost term

$$E_w = \beta\, \frac{1}{2} \sum_j w_j^2 \qquad (5)$$

was added to equation 3. A value of β = 0.0001 was chosen
empirically to optimize the ANN performance. After each 
training iteration, the cost function is evaluated on a sepa- 
rate validation set. After a chosen number of training iter- 
ations, training terminates and the final weights chosen for 
the ANN are those from the iteration at which the cost func- 
tion is minimal on the validation set. This is useful to avoid 
over-fitting to the training set if the training set is small. 
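
The ingredients described in this section (the sigmoid node function, the sum-of-squares cost with a quadratic weight penalty, and the choice of the weights that minimize the validation cost) can be sketched in a few lines of Python. This is only an illustration and not the training programme used here: the weight layout, the use of a sigmoid at the output node, the inclusion of bias weights in the penalty and the helper names are all assumptions, and the quasi-Newton optimizer itself is not shown.

```python
import math


def sigmoid(u):
    """Node function g(u) = 1 / [1 + exp(-u)]."""
    return 1.0 / (1.0 + math.exp(-u))


def forward(layers, x):
    """Evaluate the network function F(w, x).

    layers is a list of layers; each layer is a list of nodes; each node is a
    list [bias_weight, w_1, ..., w_n] acting on the previous layer's outputs.
    """
    activations = list(x)
    for layer in layers:
        activations = [
            sigmoid(node[0] + sum(w * a for w, a in zip(node[1:], activations)))
            for node in layer
        ]
    return activations


def cost(layers, inputs, targets, beta=0.0001):
    """Sum-of-squares cost plus the quadratic weight penalty (single output)."""
    data_term = sum((z - forward(layers, x)[0]) ** 2
                    for x, z in zip(inputs, targets))
    penalty = beta * 0.5 * sum(w * w
                               for layer in layers for node in layer for w in node)
    return data_term + penalty


def best_iteration(validation_costs):
    """Index of the training iteration with the lowest validation cost."""
    return min(range(len(validation_costs)), key=validation_costs.__getitem__)


# A 2:2:1 network with all weights zero returns 0.5 for any input.
zero_net = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [[0.0, 0.0, 0.0]]]
print(forward(zero_net, [0.3, 0.7]))
```

In practice one would store the weight vector after every iteration (or whenever the validation cost improves) and keep the set indexed by `best_iteration`.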



3 MODEL GALAXY CATALOGUES 
3.1 Semi-analytic models 

To provide a galaxy catalogue on which to train and test 
ANNs, a semi-analytic model was used. Semi-analytic mod- 
els are an attempt to use simple recipes to parameterize 
the main physical processes of galaxy formation within the 
hierarchical paradigm of galaxy formation (e.g. Kauffmann, 
White & Guiderdoni 1993; Cole et al. 1994). In these models, 
Monte Carlo techniques may be used to efficiently generate 
large mock galaxy catalogues with a (broadly-speaking) re- 
alistic distribution of galaxy types, luminosities, colours and 
redshifts. Here the current version of the code developed by 
Somerville (1997) is used. This has been shown to produce 
good agreement with many properties of local and high- 
redshift galaxies (Somerville & Primack 1999; Somerville, 
Primack & Faber 2001; Firth et al. 2002a).

In this model (see Somerville & Primack 1999 for de- 
tails), the number density of haloes of various masses at a 
given redshift is determined by an improved version of the 
Press-Schechter model (Sheth & Tormen 1999) and the for- 
mation and merging of dark matter haloes as a function 
of time is represented by a 'merger tree'. The cooling of 
gas, formation of stars, and reheating and ejection of gas 
by supernovae within these haloes are modelled by simple 
recipes. Cold gas is assumed to initially cool into, and form 
stars within, a rotationally supported disc. Major merg- 
ers between galaxies destroy the discs and create spheroids. 
Galaxy mergers also produce bursts of star formation. The 
chemical evolution and star formation history of each galaxy
is traced and convolved with multi-metallicity stellar population
synthesis models (Devriendt, Guiderdoni & Sadat
1999), and a dust extinction law, in order to calculate the
galaxy's SED.

† The initial weights were randomly chosen from a uniform distribution
with range [−0.7, 0.7].

There are several advantages in using a semi-analytic 
model here. Firstly, an arbitrarily large number of galaxies 
may be generated over any desired redshift range, and with 
any magnitude limit. The 'true' redshift and magnitudes 
(prior to the addition of photometric noise) are known pre- 
cisely. At present there is no large observed spectroscopic 
sample at high redshift, and those spectroscopic samples 
that do exist tend to be biased towards luminous galaxies 
with prominent emission lines at optical wavelengths. On 
the other hand, simpler model galaxy catalogues (e.g. PLE 
models) are less likely to produce realistic distributions of 
galaxy SEDs in terms of composite stellar populations, ages, 
metallicities and the effects of dust (all as a function of red- 
shift). Where (when) suitable spectroscopic samples exist, 
the model catalogues could be replaced by observed photo- 
metric and spectroscopic samples and the results would be 
expected to be comparable (see e.g. §8). 

3.2 Preparing the input catalogue 

An H < 22 catalogue in UBVRIH was generated using 
the semi-analytic model. To simulate a real galaxy survey, 
photometric noise was added to this catalogue, simulating 
5σ magnitude limits of U = 25.1, B = 26.6, V = 26.1, R =
25.6, I = 24.7 and H = 20.5 (typical of current and future 
large surveys aimed at studying large-scale structure at high 
redshifts - e.g. the LCIR Survey, Firth et al. 2002a). An 
extra 0.05 mags r.m.s. error term was included to simulate 
seeing variations, zeropoint inaccuracies and other sources 
of photometric errors. Finally an H < 20.5 'noisy' sample 
was drawn from this catalogue. 
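
One plausible way to implement this noise step is sketched below in Python. The details are assumptions rather than a description of the procedure actually used: the flux error is taken to be constant in each filter (background-limited) and equal to one fifth of the flux at the 5σ limiting magnitude, the extra 0.05 mag term is drawn as a Gaussian in magnitude, and the zero point is arbitrary.

```python
import math
import random


def flux_from_mag(mag):
    """Flux in arbitrary units for a magnitude with zero point 0."""
    return 10.0 ** (-0.4 * mag)


def flux_sigma(limit_5sigma):
    """1-sigma flux error implied by a 5-sigma limiting magnitude."""
    return flux_from_mag(limit_5sigma) / 5.0


def signal_to_noise(mag, limit_5sigma):
    """Signal-to-noise ratio of an object of the given magnitude."""
    return flux_from_mag(mag) / flux_sigma(limit_5sigma)


def add_noise(mag, limit_5sigma, rng, extra_mag_rms=0.05):
    """Return a noisy magnitude, or None if the noisy flux is not positive."""
    noisy_flux = flux_from_mag(mag) + rng.gauss(0.0, flux_sigma(limit_5sigma))
    if noisy_flux <= 0.0:
        return None  # treated as a non-detection in this sketch
    return -2.5 * math.log10(noisy_flux) + rng.gauss(0.0, extra_mag_rms)


rng = random.Random(1)
print(add_noise(23.0, 24.7, rng))   # e.g. an I = 23.0 galaxy with a limit of 24.7
```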

So that all weights are treated fairly equally in equation 
5, it is useful to normalize the magnitudes in each filter and 
the model redshifts to the range [0, 1]. The exact form of the
normalization is unimportant provided the same normaliza- 
tion is used for both training the ANN and using the ANN. 
For definiteness, in each filter the mean magnitude (derived 
from an H < 20.5 'noiseless' sample) was subtracted, and 
the range [−5, 5] was mapped linearly to [0, 1]. Furthermore
the redshift range [0, 3.5] was mapped linearly to [0, 1]. 
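
The normalization just described amounts to two linear maps, written out below as a short Python sketch (the function names are hypothetical). The inverse redshift map is included because the network output has to be converted back to a redshift after the ANN is applied.

```python
def normalize_magnitude(mag, mean_mag):
    """Subtract the filter's mean magnitude, then map [-5, 5] linearly to [0, 1]."""
    return (mag - mean_mag + 5.0) / 10.0


def normalize_redshift(z, z_max=3.5):
    """Map the redshift range [0, z_max] linearly to [0, 1]."""
    return z / z_max


def denormalize_redshift(y, z_max=3.5):
    """Convert a network output in [0, 1] back to a redshift."""
    return y * z_max


print(normalize_magnitude(21.0, 20.0), normalize_redshift(1.4))
```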



4 SELECTING NETWORK PARAMETERS 

4.1 Number of training iterations 

First of all, the required number of training iterations N_iter
was investigated. Clearly this will depend on the characteristics
of the data set and the complexity of the network
architecture. There is also an element of chance due to the
randomized initial weights. After an ANN has been trained,
its performance is assessed by running a testing set (distinct
from the training and validation sets) through it and calculating
the r.m.s. in Δz = z_model − z_phot, where z_model are the
model redshifts and z_phot are the corresponding ANN photometric
redshifts. Fig. 2 plots the change in r.m.s. as N_iter
is increased. Three architectures, covering a range in complexity,
are compared and, for each architecture, five ANNs
were produced starting with different random seeds. A training
set of size 10000 was used (cf. §4.3) and the ANNs were
tested on a separate testing set, also of size 10000. Note that
the testing set could be any size. A sample of size 10000 was
chosen simply to provide good statistics.

Figure 2. The r.m.s. in Δz = z_model − z_phot (measured on 10000
test galaxies) as a function of the number of training iterations for
three ANN architectures (indicated in upper right of each panel).
For each architecture, five ANNs were trained, each initialized
with a different random seed. A training set of size 10000 was
used.

The r.m.s. decreases as N_iter is increased but levels off
for large N_iter. For most random seeds, improvement beyond
N_iter ~ 500 is slow. For the remainder of this paper, in which
a similar range of architectures, and the same - or simpler
- data sets are considered, N_iter will be restricted to 1000.

Fig. 3 compares the estimated photometric redshifts on 
the testing set, using the 6:10:10:10:1 architecture, for two 
different random seeds after 1000 training iterations. The 
two ANNs closely agree. Generally, different initial random 
seeds lead to ANNs with similar r.m.s. accuracy, though 
there are still differences at the ≲10 per cent level. It is useful,
therefore, to generate several ANNs using different random
seeds. One may then use the validation set to choose
the 'best' ANN. A better approach (see e.g. Bishop 1995 for 
details) is to combine a set of ANNs generated using dif- 
ferent random seeds. This is called a 'committee of ANNs'. 
In this paper, the estimated redshift for each galaxy pre- 
sented to the committee is taken to be the median of the 
estimates provided by the individual ANNs. Typically the 
committee gives more accurate redshift estimates than any 
of its component ANNs taken individually (see §4.2, Table 
1 for examples). 
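
The committee estimate described above is simply a per-galaxy median over the individual network outputs. A minimal Python sketch (with a hypothetical function name and made-up numbers) is:

```python
from statistics import median


def committee_redshift(estimates_per_network):
    """Median redshift per galaxy.

    estimates_per_network[n][g] is the estimate from network n for galaxy g.
    """
    return [median(per_galaxy) for per_galaxy in zip(*estimates_per_network)]


# Three networks, two galaxies (toy numbers).
estimates = [
    [0.10, 0.50],
    [0.20, 0.40],
    [0.90, 0.45],
]
print(committee_redshift(estimates))
```

Using the median rather than the mean means that a single badly trained network (the 0.90 above) does not drag the committee estimate away from the consensus.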





Figure 3. Comparison of photometric redshifts (using 2000 test 
SAM galaxies) for two ANNs initialized with different random 
seeds. A 6:10:10:10:1 network architecture, 10000 training galaxies
and 1000 training iterations were used. The r.m.s. in Δz =
z_1 − z_2 is 0.065.



4.2 Network architecture 

More complex network architectures have more free param- 
eters (weights) and therefore allow a closer fit to the data. 
In any real data set there will be a fundamental limit to the 
r.m.s. fit, due to random noise in the input measurements, 
and further increases in architecture complexity will provide 
no significant improvement. In addition, architectures with 
more weights take longer to train. In any given situation 
one would like to use the simplest network possible while 
still obtaining optimal results. 

Table 1 compares the r.m.s. in Δz = z_model − z_phot, evaluated
on a testing set of size 10000, for several architectures
(see also Fig. 4). For committees of five ANNs, the networks
with a single hidden layer (i.e. 6:m:1) reach a limiting r.m.s.
of ~0.14 (e.g. for m ~ 15). There is no significant improve- 
ment for larger m. However adding extra hidden layers offers 
some improvement, even when the total number of weights 
is not increased. The committees of five 6:6:6:6:1, 6:10:10:1 
and 6:10:10:10:1 ANNs in Table 1 all produce r.m.s. val- 
ues of ~0.115. The scatter in the r.m.s. between different 
committees is of order ±0.002 for these architectures, and 
increasing the number of ANNs in the committee leads only 
to a small improvement in the r.m.s. (e.g. ~0.001 decrease in 
the r.m.s. for a committee of 25 ANNs). Adding further hid- 
den layers (e.g. 6:10:10:10:10:1) leads to little or no further 
improvement. 

For the remainder of this paper, a 6:6:6:6:1 architecture 
will be used as the fiducial ANN, since this architecture gives 
results comparable with the best results in Table 1 but has 
relatively few weights. Unless stated otherwise, redshifts will 
be estimated using a committee of five such ANNs - taking 
the median redshift estimate of five for each galaxy. 
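
To compare architectures by their number of free parameters, the weights can be counted directly from the architecture string. The Python sketch below assumes a fully connected network in which every non-input node also receives one weight from the bias node; this matches the description of Fig. 1, but the exact bookkeeping of the training programme is not stated, so the counts should be read as indicative.

```python
def count_weights(architecture):
    """Number of weights for an architecture string such as '6:6:6:6:1'.

    Assumes full connectivity between adjacent layers plus one bias weight
    for each node outside the input layer.
    """
    sizes = [int(n) for n in architecture.split(":")]
    return sum((n_in + 1) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


for arch in ["6:15:1", "6:6:6:6:1", "6:10:10:1", "6:10:10:10:1"]:
    print(arch, count_weights(arch))
```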

Clearly the required network complexity depends on the 
data set. For data with a higher signal-to-noise ratio, the
fundamental r.m.s. limit on Δz will be lower and a more
complex architecture may be necessary to reach this limit. 
Conversely, for data covering a smaller redshift range, fewer 
free parameters will be necessary to model the mapping from 
colours to redshift, and a simpler architecture may suffice. 

Table 1. The mean and r.m.s. in Δz = z_model − z_phot for various network architectures. The simulated SAM catalogue is H < 20.5
with 5σ limits U = 25.1, B = 26.6, V = 26.1, R = 25.6, I = 24.7 and H = 20.5. A training set of 10000 galaxies was used. Columns 3
and 4 display mean values for five ANNs initialized with different random seeds. Columns 5 and 6 display values for a single committee
comprising five ANNs (initialized with different random seeds). Note that a committee produces significantly better results than its
component ANNs.

Figure 4. Photometric redshift versus model redshift comparisons
(using 2000 test galaxies) for several network architectures
(indicated at upper left in each panel). The simulated SAM catalogue
is H < 20.5 with 5σ limits U = 25.1, B = 26.6, V = 26.1,
R = 25.6, I = 24.7 and H = 20.5. A committee of 5 ANNs (for
each architecture) and a training set of 10000 galaxies were used.
Note the increased scatter at high redshift where there are fewer
training galaxies. The quantization at 0.1 intervals in z is due to
poor redshift-space resolution in the colour grid used to determine
galaxy colours in the semi-analytic model.



4.3 Size of training set 

Often one will have no choice concerning the size of the
training set, N_train. However, when designing surveys, it is
useful to assess what size of training set is necessary to provide
a given redshift accuracy. Fig. 5 plots the r.m.s. in
Δz = z_model − z_phot, evaluated on a testing set of size 10000,
as a function of N_train, for the 6:6:6:6:1 architecture (using a
single random seed). The r.m.s. decreases as N_train increases
but begins to level off for large N_train values. Clearly there
is a trade-off - one can obtain fairly good results for quite
small training sets (e.g. r.m.s. ~0.15 for N_train = 1000) but
larger training sets can give significant improvements (e.g.
r.m.s. ~0.12 for N_train = 10000).

Figure 5. Photometric redshift accuracy as a function of training
set size N_train. The plot shows the r.m.s. in Δz = z_model − z_phot,
evaluated on 10000 test galaxies. The simulated SAM catalogue is
H < 20.5 with 5σ limits U = 25.1, B = 26.6, V = 26.1, R = 25.6,
I = 24.7 and H = 20.5. A 6:6:6:6:1 network architecture (with
a single random seed) was used. For N_train < 10000, up to 10
ANNs were generated using separate training sets; points and
error bars show respectively medians and interquartile ranges for
the different ANNs.

Also plotted in Fig. 5 are the r.m.s. values for differ- 
ent redshift bins: z < 0.5, 0.5 < z < 1.0 and z > 1.0. In 
the testing set, galaxies are distributed between these bins 
in the approximate ratio 2:2:1. Since there are fewer train- 
ing galaxies in the high-redshift bin, the network weights 
are less constrained and the r.m.s. values are larger. From 
N_train = 200 to N_train = 5000 the r.m.s. in the z < 0.5 bin
drops by about 20 per cent while the r.m.s. in the z > 1.0 
bin drops by more than 50 per cent. Thus larger training sets 
are important for tying down the redshifts of rare objects, 
but if this is not of particular interest (e.g. many large-scale 
structure surveys) then one can manage with smaller train- 
ing sets. 

It is important to note that the value of N_train required to
achieve a given accuracy depends on the variation inherent
in the data set. In particular, for photometric redshifts it
depends on the redshift range. In §5 to §7, in which the same
H < 20.5 SAM catalogue is used, the training set size will be
fixed to 10000. For low-redshift/shallow surveys (e.g. SDSS
- see §8), smaller training sets should suffice.

Figure 6. A comparison of ANN and hyperz photometric redshifts
with model redshifts on 2000 test galaxies. The simulated
SAM catalogue is H < 20.5 with 5σ limits U = 25.1, B = 26.6,
V = 26.1, R = 25.6, I = 24.7 and H = 20.5. A committee of five
6:6:6:6:1 ANNs and a training set of 10000 galaxies were used.
Eight Bruzual & Charlot GISSEL'98 evolving SEDs were used as
templates in hyperz with A_V in the range 0.0-1.2.



5 COMPARISONS WITH THE TEMPLATE- 
FITTING METHOD 

Fig. 6 compares the model redshifts of 2000 test galax- 
ies with the redshifts estimated using a committee of five 
6:6:6:6:1 ANNs and a training set of 10000 galaxies. As 
found in §4, the r.m.s. in Δz = z_model − z_phot is 0.12 over
the redshift range 0 < z < 3.5. Of course, the performance
depends entirely on the particular set of filters and
limiting magnitudes‡ so, for comparison, the results of the
template-fitting code hyperz§ (Bolzonella, Miralles & Pello
2000) are also shown (using the 8 synthetic Bruzual & Charlot
1993 GISSEL'98 evolving templates, that are distributed
with hyperz, and dust extinction A_V in the range 0.0-1.2).
The hyperz results are comparable but slightly worse. Formally
the r.m.s. in Δz is 0.26, but this relatively high value
may be due more to the small number of complete misidentifications
- against which the ANN seems to be more robust
- than to a general increased scatter. However, as a further
comparison, the scatter measured by the half-range of the
central 68 per cent Δz values is 0.02 for the ANN and 0.11
for hyperz. 
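
The two scatter statistics quoted here respond very differently to catastrophic misidentifications, which the following Python sketch illustrates. The precise definitions are assumptions: the r.m.s. is taken about zero, and the half-range of the central 68 per cent is taken as half the difference between the 84th and 16th percentiles, with linear interpolation between order statistics.

```python
import math


def rms(values):
    """Root mean square of the residuals, taken about zero."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def percentile(sorted_values, q):
    """Linearly interpolated quantile q (0 <= q <= 1) of a sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def central_68_half_range(values):
    """Half the range spanned by the central 68 per cent of the values."""
    s = sorted(values)
    return 0.5 * (percentile(s, 0.84) - percentile(s, 0.16))


# Toy residuals: a uniform spread plus one catastrophic outlier.
residuals = [i / 100.0 for i in range(-50, 51)]
with_outlier = residuals + [10.0]
print(rms(residuals), central_68_half_range(residuals))
print(rms(with_outlier), central_68_half_range(with_outlier))
```

A single outlier inflates the r.m.s. several-fold while leaving the central half-range almost unchanged, which is why both numbers are worth quoting when a method suffers occasional complete misidentifications.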

It may be argued that hyperz is at a disadvantage here,
relative to the ANN, since the ANN training and testing sets
are based on the same spectral models while hyperz is trying
to fit a different set of templates. For a real data set,
hyperz - like all template-fitting methods - could still suffer
from template mismatches while ANNs automatically fit
the data. Synthetic templates were used in hyperz for the
above comparison since synthetic templates, albeit from a
different source, are used in the semi-analytic model. However,
when using hyperz on real data sets, empirical SEDs,
such as the four Coleman et al. (1980, CWW) SEDs - E,
Sbc, Scd and Im - distributed with the hyperz code, often
produce better results. It is possible that the four CWW
templates are a closer match to real galaxy SEDs than
the eight GISSEL'98 evolving templates are to the semi-analytic
model SEDs. Therefore as a further comparison,
which maximally favours hyperz, the UBVRIH photometry
for each galaxy in the semi-analytic model was replaced by
UBVRIH photometry for the best-fitting (using rest-frame
B − I colour) CWW template SED at the same model redshift.
Noise was added to the new photometry as described
above and an H < 20.5 sample was reselected. The magnitudes
and redshifts were normalized as described above and
a new committee of five 6:6:6:6:1 ANNs was generated. Fig.
7 compares the new ANN with hyperz. The formal r.m.s.
in Δz is 0.10 for the ANN and 0.12 for hyperz, while the
half-width of the central 68 per cent Δz values is 0.02 for the
ANN and 0.04 for hyperz. Hence, even for this case maximally
favouring hyperz, the ANN performs at least as well
as template-fitting.

‡ The sample catalogue is of much poorer quality than the Hubble
Deep Field, for example.
§ http://webast.ast.obs-mip.fr/hyperz/

Figure 7. A comparison of ANN and hyperz photometric redshifts
with model redshifts on 2000 test galaxies. The simulated
catalogue is H < 20.5 with 5σ limits U = 25.1, B = 26.6,
V = 26.1, R = 25.6, I = 24.7 and H = 20.5. Prior to adding
photometric noise, the photometry for each galaxy in the semi-analytic
model catalogue was replaced with the photometry of the
best-fitting of four empirical CWW template spectra (E, Sbc, Scd
or Im) at the same redshift. A committee of five 6:6:6:6:1 ANNs
and a training set of 10000 galaxies were used. The 4 CWW SEDs
themselves were used in hyperz.



6 SCATTER DUE TO PHOTOMETRIC 
ERRORS 

It is of interest to see what the distribution of estimated red- 
shifts is for a particular galaxy, as a result of random photo- 
metric errors. Random noise was added, as described above, 
to a selection of individual model galaxies, with 1000 random 
simulations per galaxy. Then redshifts were estimated us- 
ing the above previously trained committee of five 6:6:6:6:1 
ANNs. Fig. 8 plots histograms of the estimated redshifts for 
each original galaxy. The width of the P(z) distribution in- 
creases for fainter galaxies but a more pronounced trend is 
that it increases for galaxies at higher redshifts. There are
fewer training galaxies at high redshifts which means that
(a) the network weights are less constrained than at lower
redshifts, and (b) the training process gives more weight to
improving the accuracy for the majority of galaxies at low
redshifts at the expense of accuracy at high redshifts.

Figure 8. Eight galaxies were selected from the original 'noiseless'
SAM catalogue. Random photometric noise was added simulating
5σ limiting magnitudes U = 25.1, B = 26.6, V = 26.1,
R = 25.6, I = 24.7 and H = 20.5, with 1000 simulations per initial
galaxy. The plots show histograms (one for each of the original
eight galaxies) of the redshifts estimated with the committee of
five ANNs previously trained on the full 'noisy' H < 20.5 training
set (as in §5). The upper four panels are for bright (H ~ 19.5)
galaxies while the lower four panels are for faint (H ~ 20.5) galaxies.
The arrows indicate the true (model) redshift of each galaxy.



7 SPECTRAL TYPE CLASSIFICATION 

One can also use ANNs to determine spectral type (indepen- 
dently of redshift) provided, as emphasized above, the train- 
ing set is representative in terms of both spectral types and 
redshifts (and hence galaxy colours). As in §5, the photome- 
try for each semi-analytic model galaxy was replaced by that 
for the best-fitting CWW template SED at the same model 
redshift. Then simulated photometric noise was added and 
an H < 20.5 sample was selected. The best-fitting CWW 
spectral types E, Sbc, Scd and Im give the desired output 
on which the network is trained. 

Since the input colour data is the same as in the photo- 
metric redshift problem of §4, a similar network architecture 
was used. However it was modified to produce four outputs 
(viz. 6:6:6:6:4), corresponding to the four spectral types (at 
any redshift)^. When training, the desired output is 1 for the 
output node corresponding to the correct type and for the 
other three nodes. When a galaxy of unknown spectral type 
is run through the ANN, the output in each node may be 
treated approximately as a probability for the galaxy being 
of the corresponding type, and the galaxy is assigned the 
type for which the probability is greatest (cf. e.g. Storrie- 
Lombardi et al. 1992; Lahav et al. 1996). The input mag- 
nitudes were normalized to the range [0, 1] as above, 10000 

• Alternatively, since galaxy spectral types roughly follow a se- 
quence (e.g. Connolly et al. 1995a; Nairn et al. 1995), one could 
utilize a network with a single output node to classify galaxies, 
for example, on a scale of to 1, where corresponds to spectral 
type E and 1 corresponds to spectral type Im. 



training objects were used, and a committee of five 6:6:6:6:4 
ANNs was generated. 
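
The output coding described above can be illustrated with a minimal Python sketch. This is not the code used in this work: the function names are hypothetical, and averaging the committee outputs before taking the maximum is an assumption, since the text does not state how committee outputs are combined for classification.

```python
import numpy as np

# The four CWW spectral types, in the order of the four output nodes.
SPECTRAL_TYPES = ("E", "Sbc", "Scd", "Im")


def one_hot_targets(type_labels):
    """Return an (N, 4) array with 1 in the column of the correct type, 0 elsewhere."""
    index = {name: i for i, name in enumerate(SPECTRAL_TYPES)}
    targets = np.zeros((len(type_labels), len(SPECTRAL_TYPES)))
    for row, label in enumerate(type_labels):
        targets[row, index[label]] = 1.0
    return targets


def assign_types(committee_outputs):
    """Assign each galaxy the type whose output is largest.

    committee_outputs has shape (n_networks, n_galaxies, 4).
    Averaging over the committee axis is an assumption of this sketch.
    """
    mean_outputs = np.mean(committee_outputs, axis=0)
    best = np.argmax(mean_outputs, axis=1)
    return [SPECTRAL_TYPES[i] for i in best]
```

Each row of the averaged outputs plays the role of the approximate type probabilities mentioned in the text, and the assigned type is the one with the largest value.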

The ANN results are displayed in Table 2. The ANN 
spectral types agree very well with the original CWW spec- 
tral types - the mean error rate is ~1 per cent. Table 2 
also shows the equivalent hyperz results. Here the ANN and 
hyperz perform comparably well. 



8 PERFORMANCE OF NEURAL NETWORKS 
ON SDSS DATA 

The Sloan Digital Sky Survey‖ (SDSS; York et al. 2000) 
consortium have now publicly released more than 50000 
spectroscopic redshifts along with ugriz photometry and various 
image morphological parameters. These provide an excellent 
opportunity to test ANNs on real data (see also 
Sowards-Emmerd et al. 2000 for a polynomial-fitting approach). 
Objects were selected from the SDSS public data set using the 
following criteria: (1) the spectroscopic redshift confidence 
must be greater than 0.95 and there must be no warning 
flags, (2) r < 17.5, (3) redshift < 0.5. Stars were left in with 
the galaxies but at these magnitudes they could have been 
fairly robustly removed using image morphology. The order 
was randomized and the magnitudes, redshifts and other 
parameters were normalized to the range [0, 1]. 
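
The selection and preprocessing steps above might be sketched as follows. The argument names are hypothetical, encoding "no warning flags" as a zero-valued flag is an assumption, and the linear min-max rescaling is one plausible reading of "normalized to the range [0, 1]" rather than a statement of the exact procedure used.

```python
import numpy as np


def select_sample(z_conf, z_warning, r_mag, z_spec):
    """Boolean mask for the three selection criteria quoted in the text.

    z_warning == 0 is assumed here to mean "no warning flags".
    """
    return (z_conf > 0.95) & (z_warning == 0) & (r_mag < 17.5) & (z_spec < 0.5)


def normalize_columns(data):
    """Linearly rescale each column of an (N, k) array to the range [0, 1]."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    return (data - lo) / span


def shuffle_rows(data, seed=0):
    """Return the rows of data in a random (but reproducible) order."""
    rng = np.random.default_rng(seed)
    return data[rng.permutation(len(data))]
```

Note that the same rescaling constants would have to be applied to the training and testing sets for the network outputs to be comparable.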

Because the SDSS redshift range is much smaller than 
that of the SAM catalogue used in §4 (though note the addi- 
tion of stars here), a simpler architecture may suffice. How- 
ever, for the sake of simplicity, a similar architecture is used 
in this section also. Two ANN architectures were used: one 
(5:6:6:6:1) taking ugriz photometry as input, and the other 
(8:6:6:6:1) taking ugriz photometry together with the SDSS 
pipeline star/galaxy classifier ('type') and the Petrosian 50 per 
cent and 90 per cent r-band flux radii, r50 and r90. A training set of 
size 10000 was used and, for each architecture, a committee 
of five ANNs was generated. 
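
To make the architecture notation concrete, the sketch below builds a fully connected feed-forward network from a string such as 5:6:6:6:1 and runs one forward pass. It is illustrative only: the random initial weights, the inclusion of bias terms and the use of a logistic activation at every layer (including the output) are assumptions of this sketch, and no training is performed.

```python
import numpy as np


def parse_architecture(spec):
    """Turn a string such as '5:6:6:6:1' into a list of layer sizes."""
    return [int(n) for n in spec.split(":")]


def init_weights(layers, seed=0):
    """Small random weights and zero biases for each pair of adjacent layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layers[:-1], layers[1:])
    ]


def forward(x, weights):
    """Propagate inputs of shape (N, n_inputs) through the network.

    A logistic activation is applied at every layer (an assumption),
    so the outputs lie in (0, 1), like the normalized redshifts.
    """
    a = np.asarray(x, dtype=float)
    for w, b in weights:
        a = 1.0 / (1.0 + np.exp(-(a @ w + b)))
    return a
```

Counting bias terms, a 5:6:6:6:1 network of this form has 127 free parameters and an 8:6:6:6:1 network has 145, both small compared with the 10000 training objects.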

Fig. 9 compares the ANN redshifts with spectroscopic 
redshifts for a testing set of 7000 galaxies. The r.m.s. values 
of Δz are 0.023 and 0.021 for committees of five 5:6:6:6:1 and 
8:6:6:6:1 networks respectively, while the mean offsets are 
both 0.000. These results are easily as good as, and probably 
better than, the best results that template-fitting photomet- 
ric redshift methods can produce. There are also very few 
outliers. The spike at z = 0 in the 5:6:6:6:1 network results 
is due to misidentified stars. In the 8:6:6:6:1 network the 
addition of morphological parameters largely removes this 
feature. 
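
The statistics quoted above can be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, Δz is taken as z_spec - z_phot, and the r.m.s. is taken about zero rather than about the mean offset.

```python
import numpy as np


def committee_redshift(z_committee):
    """Median over the committee axis; input shape (n_networks, n_galaxies)."""
    return np.median(z_committee, axis=0)


def redshift_errors(z_phot, z_spec):
    """Return (mean offset, r.m.s.) of dz = z_spec - z_phot.

    The sign convention and the r.m.s. about zero are assumptions.
    """
    dz = np.asarray(z_spec, dtype=float) - np.asarray(z_phot, dtype=float)
    return float(np.mean(dz)), float(np.sqrt(np.mean(dz ** 2)))
```

Taking the median across the committee, as in Fig. 9, makes the combined estimate insensitive to a single network that returns a discrepant redshift.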

The Participating Institutions are The University of Chicago, 
Fermilab, the Institute for Advanced Study, the Japan Partici- 
pation Group, The Johns Hopkins University, the Max-Planck- 
Institute for Astronomy (MPIA), the Max-Planck-Institute for 
Astrophysics (MPA), New Mexico State University, Princeton 
University, the United States Naval Observatory, and the Uni- 
versity of Washington. 

Table 2. Comparisons of the efficiency with which ANNs and hyperz recover galaxy spectral types. The simulated catalogue is H < 20.5 
with 5σ limits U = 25.1, B = 26.6, V = 26.1, R = 25.6, I = 24.7 and H = 20.5. A committee of five 6:6:6:6:4 ANNs and a training set 
of 10000 galaxies were used. 

Figure 9. A comparison of photometric and spectroscopic 
redshifts using SDSS public data. Two ANN architectures were used, 
taking as input ugriz photometry (5:6:6:6:1 architecture, left) and 
ugriz photometry, SDSS star/galaxy classifier and Petrosian 50 
and 90 per cent r-band flux radii (8:6:6:6:1 architecture, right). A 
training set of size 10000 was used. The ANNs were tested on a 
separate testing set of size 7000 (plotted). In each panel, redshift 
estimates are medians from a committee of five ANNs. 

While 10000 training objects were used for the above 
networks, one can still do fairly well with much smaller train- 
ing set sizes. For example, with only 500 training objects, 
the r.m.s. values become respectively 0.028 and 0.027 (cf. 
Fig. 5). Again, larger training sets are expected to be useful 
for pinning down the classifications of rare objects - e.g. the 
r.m.s. in the redshift range 0.25 < z < 0.35 is ~0.03 for 
N_train = 10000 but degrades to ~0.06 for N_train = 500. 
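
A restricted-range r.m.s. of the kind quoted above could be evaluated as in this short sketch. The function name is hypothetical, and selecting objects by their spectroscopic (rather than photometric) redshift is an assumption.

```python
import numpy as np


def rms_in_range(z_phot, z_spec, z_min, z_max):
    """R.m.s. of z_spec - z_phot for objects with z_min < z_spec < z_max.

    Returns NaN if no object falls in the range.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    in_range = (z_spec > z_min) & (z_spec < z_max)
    if not in_range.any():
        return float("nan")
    dz = z_spec[in_range] - z_phot[in_range]
    return float(np.sqrt(np.mean(dz ** 2)))
```

Because few objects lie in a sparsely populated range, such an estimate is itself noisy and should be read together with the number of objects contributing to it.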



9 CONCLUSIONS 

ANNs can produce photometric redshift accuracies that are 
comparable to or better than template-fitting procedures. 
However they do rely on large and representative training 
samples and an ANN is only applicable to the particular 
survey filters and redshift range upon which it has been 
trained. For large photometric/spectroscopic surveys, such 
as the SDSS and future deeper surveys such as DEEP2** 
and the VIRMOS-VLT Deep Survey†† (VVDS; Le Fèvre et 
al. 2000), where large spectroscopic samples are available, 
it seems that ANNs offer some significant advantages over 
previous approaches. The VVDS, for example, is expected to 
include > 100000 redshifts to I_AB = 22.5, > 40000 redshifts 
to I_AB = 24 and > 1000 redshifts to I_AB = 25, providing 
ample training set sizes for the complementary deep imaging 
in UBVRIK_s to I_AB = 25 (similar to the limits used in §3). 

With careful modelling of photometric errors and some 
loss in the Bayesian statistics, bright spectroscopic samples 
may also be extrapolated to provide training sets for fainter 
photometric samples. 

In addition, ANNs may also be used where spectroscopic 
redshifts are unavailable, by utilizing a simulated catalogue 
(e.g. semi-analytic model) as a training set. By using 
theoretical SEDs in the training set, this method has all the 
disadvantages and advantages of standard template-fitting, 
but it also has the extra advantages (i) the 'template' SEDs 
include a (more or less) realistic distribution of complex star 
formation histories, dust modelling and metallicities etc., 
giving fully Bayesian statistics, and (ii) the weights applied 
to different filters (and non-linear combinations thereof) 
may be more optimal than simple χ²-weighting. 

To conclude, while template-fitting photometric redshifts 
may be used to good effect in pioneering studies of 
new populations of objects, spectroscopic confirmation will 
always be necessary to obtain truly robust scientific results. 
Instead the real power of photometric redshifts lies in 
extending small very resource-intensive faint spectroscopic 
surveys to much larger fields-of-view and sample sizes. That is 
to say, the area where photometric redshifts can best be used 
for robust and useful scientific gains is in the training-set 
regime. ANNs provide a powerful tool for obtaining 
high-quality photometric redshifts in such surveys. 

** http://astron.berkeley.edu/~marc/deep/ 
†† http://www.astrsp-mrs.fr/virmos/ 